This notebook tries the [semi-supervised methods][semi] described in the documentation. Label propagation assigns labels to the unlabelled test data based on the labels in the training data.
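
As a small illustration of the idea, the sketch below runs scikit-learn's `LabelPropagation` on made-up one-dimensional data (not the seizure features). Unlabelled samples are marked with `-1`, and the fitted model's `transduction_` attribute holds the label inferred for every sample. The toy values and variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy data (an assumption for illustration): two well-separated 1-D clusters.
X_toy = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])

# One labelled sample per cluster; -1 marks the unlabelled samples.
y_toy = np.array([0, -1, -1, 1, -1, -1])

# kernel="rbf" and gamma=20 are the library defaults, spelled out for clarity.
model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X_toy, y_toy)

# Labels inferred for all samples, labelled and unlabelled alike.
inferred = model.transduction_
print(inferred)
```

The known labels spread to nearby unlabelled points through the kernel weights, which is why the scale of the features relative to `gamma` matters.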


In [1]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 12
plt.rcParams['axes.grid'] = True
plt.set_cmap('brg')


<matplotlib.figure.Figure at 0x7f3f09503588>

In [2]:
cd ..


/home/gavin/repositories/hail-seizure

In [3]:
from python import utils

In [4]:
with open("settings/testing_labelprop.json") as fh:
    settings = utils.json.load(fh)

In [5]:
with open("segmentMetadata.json") as fh:
    meta = utils.json.load(fh)

In [6]:
data = utils.get_data(settings)

In [8]:
da = utils.DataAssembler(settings,data,meta)

Then we just need to build training sets for each subject and apply the relevant models. Unfortunately, the cross-validator doesn't handle test segments so we won't be able to run any informative cross-validation.


In [15]:
import sklearn.ensemble
import sklearn.preprocessing
import sklearn.semi_supervised

In [19]:
scaler = sklearn.preprocessing.StandardScaler()
selector = sklearn.ensemble.ExtraTreesClassifier(n_estimators=1000)
classifier = sklearn.semi_supervised.LabelPropagation()
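
The selector defined above is later used through `selector.transform(...)`, which relies on tree ensembles exposing a `transform` method. As far as I know, that method was deprecated and then removed in newer scikit-learn releases, where `sklearn.feature_selection.SelectFromModel` plays the same role. A minimal sketch on random stand-in data (the data, `toy_selector` and the other names are assumptions, not part of this repository):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)

# Stand-in data: 10 features, of which only the first two drive the label.
X_toy = rng.normal(size=(100, 10))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

# Wrap the tree ensemble; by default, features whose importance is at least
# the mean importance are kept.
toy_selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=50, random_state=0)
)
toy_selector.fit(X_toy, y_toy)

# Keep only the selected columns.
X_selected = toy_selector.transform(X_toy)
print(X_toy.shape, "->", X_selected.shape)
```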

In [23]:
predictions = {}
for subject in settings['SUBJECTS']:
    print("Processing " + subject)
    Xtrain,ytrain = da.build_training(subject)
    Xtest = da.build_test(subject)
    
    X = np.vstack([Xtrain,Xtest])
    y = np.hstack([ytrain,np.array([-1.0]*Xtest.shape[0])])
    
    print("Fitting ExtraTree feature selection.")
    # then we want to preprocess the data
    X = scaler.fit_transform(X)
    selector.fit(Xtrain,ytrain)
    
    print("Applying ExtraTree feature selection.")
    X = selector.transform(X)
    
    print("Fitting classifier.")
    # then fit the classifier
    classifier.fit(X,y)
    
    print("Classifying test data.")
    # then classify the test set (the rows stacked after the training rows)
    predictions[subject] = classifier.predict_proba(X)[Xtrain.shape[0]:,:]
    
    break  # stop after the first subject (only Dog_1 is processed in this run)


Processing Dog_1
Fitting ExtraTree feature selection.
Applying ExtraTree feature selection.
Fitting classifier.
Classifying test data.
/home/gavin/.local/lib/python3.4/site-packages/sklearn/semi_supervised/label_propagation.py:254: RuntimeWarning: invalid value encountered in true_divide
  self.label_distributions_ /= normalizer

In [24]:
predictions


Out[24]:
{'Dog_1': array([[ nan,  nan],
        [ nan,  nan],
        [ nan,  nan],
        ..., 
        [ nan,  nan],
        [ nan,  nan],
        [ nan,  nan]])}

I'm unsure why this is happening. Label propagation may rely on an assumption I'm unaware of that this data violates.
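
One possible explanation, offered as a guess rather than a confirmed diagnosis: `LabelPropagation` defaults to an RBF kernel with `gamma=20`, which gives weights of the form exp(-gamma * ||x - y||^2). After standardisation, squared distances between samples grow with the number of features, so with many features the weights between distinct samples can underflow to zero. Unlabelled rows would then receive no label mass, and normalising a row of zeros gives 0/0. The sketch below uses random stand-in data and plain NumPy to show the underflow; the 200 features are an arbitrary assumption, not the real feature count, and `rbf_weights` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for standardised features: roughly zero mean, unit variance.
X_fake = rng.normal(size=(20, 200))


def rbf_weights(X, gamma):
    """Pairwise RBF weights exp(-gamma * squared Euclidean distance)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against negative round-off
    return np.exp(-gamma * sq_dists)


# Mask that ignores each sample's weight to itself (always 1).
off_diagonal = ~np.eye(X_fake.shape[0], dtype=bool)

w_default = rbf_weights(X_fake, gamma=20.0)
w_small = rbf_weights(X_fake, gamma=1.0 / X_fake.shape[1])

print("max weight between distinct samples, gamma=20:",
      w_default[off_diagonal].max())
print("max weight between distinct samples, gamma=1/n_features:",
      w_small[off_diagonal].max())
```

If this is indeed the cause, a much smaller `gamma` or `kernel='knn'` might be worth trying; neither has been verified on this data.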